20220623_10x_ZKW

analysis of scEOS

demo data(1/10~1/5 datasize)

loading 18,000 cells, expected 8,000 cells, but only called 377 cells

because of the small number of called cells:
first to try filtered matrix normally
second to try raw matrix and manually call cells using ‘UMI counts > 100’
get 3,000 EOS cells more but with very low counts, it should be fine after increasing the datasize

plus data(full datasize)
1. QC and initial analysis

cell calling >3.6k and EOS >90%
with other celltypes(Epthelium/Fibroblast/Macrophage/DC/T) together

initial clustering couldn’t separate EOS into distinct subclusters
more likely grouped by nFeature levels instead
hard to give a conclusion to match bulk RNAseq data

2. cleaning-up and re-clustering

only EOS kept
naturally grouped into Nmur1+ and Nmur1- parts
seems like five states: Nmur1+ EOS1/2, Nmur1- EOS3/4/5
check markers and then merge into three only: Nmur1+ EOS1, Nmur1- EOS2/3

3. trajectory

check monocle and RNAvelocity

source("/Shared_win/projects/RNA_normal/analysis.10x.r")
source("/Shared_win/projects/RNA_normal/analysis.r")

load SS2 data

Nmur1 P or N, DEGs using ‘FC>1.5 padj<0.01’

matrix

mat.SI_PN <- read.csv("/Shared_win/projects/202205_Nmur1EOS/final3.1/RNAseq.SS2_LY_20210608.filt_tpm.SI_Nmur1.pc_gene.csv")
rownames(mat.SI_PN) <- mat.SI_PN$gene
dim(mat.SI_PN)
## [1] 8001   12
#head(mat.SI_PN)
mat.raw <- read.csv("/Shared_win/projects/202205_Nmur1EOS/final3.1/RNAseq.SS2_LY_20210608.raw_tpm.gene.csv")
rownames(mat.raw) <- mat.raw$gene
dim(mat.raw)
## [1] 55413    47
#head(mat.raw)

DEGs

DEG_0608.SI_PN <- read.table("/Shared_win/projects/202205_Nmur1EOS/final3.1/edgeR_DEGs.SI_Nmur1.SI_P_vs_SI_N.csv", 
                           header = T, sep = ",")
rownames(DEG_0608.SI_PN) <- DEG_0608.SI_PN$gene
head(DEG_0608.SI_PN)
##          gene log2FoldChange   logCPM       LR       pvalue         padj
## Myc       Myc      -4.160795 6.152873 199.9002 2.195883e-45 1.756926e-41
## Clec4e Clec4e      -2.259573 9.068660 186.0219 2.348095e-42 9.393555e-39
## Snx10   Snx10      -2.168756 8.342083 180.7410 3.339058e-41 8.905268e-38
## H2-Q7   H2-Q7      -2.670811 8.156795 170.4181 5.995840e-39 1.199318e-35
## H2-Q6   H2-Q6      -2.820755 9.206747 169.6653 8.755237e-39 1.401013e-35
## Cxcl10 Cxcl10      -3.701292 6.957173 161.9878 4.162508e-37 5.550705e-34
##                FC
## Myc    -17.886451
## Clec4e  -4.788498
## Snx10   -4.496356
## H2-Q7   -6.367871
## H2-Q6   -7.065320
## Cxcl10 -13.007681
DEG_0608.SI_Pup <- proc_DEG(DEG_0608.SI_PN, abs = FALSE, p.cut = 0.01, FC.cut = 1.5, padj = T)$gene
DEG_0608.SI_Pup
##   [1] "Dpep2"         "Rcan1"         "Jaml"          "Cd22"         
##   [5] "Gtf2ird1"      "Cd33"          "Dio2"          "Fndc5"        
##   [9] "Lyz2"          "Cd300ld"       "Egr2"          "Nr4a2"        
##  [13] "Gpd1l"         "Myof"          "Gpr171"        "Ptgs1"        
##  [17] "Fgd2"          "Trim25"        "Cxcr2"         "Tnfrsf1a"     
##  [21] "Esam"          "Fcgr4"         "Cdh17"         "Prdm1"        
##  [25] "Isca1"         "Rgs1"          "Hdac5"         "Idi1"         
##  [29] "Slco4a1"       "Jakmip1"       "Maea"          "Nmur1"        
##  [33] "Pramef8"       "Dennd2a"       "Plscr1"        "Dok1"         
##  [37] "Slc38a1"       "Hmgcs1"        "Ank3"          "Gbp4"         
##  [41] "Mat2b"         "Csnk1e"        "Crtc3"         "Cd53"         
##  [45] "St18"          "Laptm4a"       "Msmo1"         "Insig1"       
##  [49] "Klra17"        "Plekhm3"       "Tspan13"       "Abcg2"        
##  [53] "Eepd1"         "Ly86"          "Gm20605"       "Ptger3"       
##  [57] "Il27"          "Fcrls"         "Osbpl9"        "Agmo"         
##  [61] "Zzz3"          "Grina"         "Gbp8"          "Mbnl2"        
##  [65] "Plk3"          "Ovca2"         "Cass4"         "Foxn3"        
##  [69] "Lyz1"          "Stx17"         "Slc23a2"       "Olfm4"        
##  [73] "Dnase1l3"      "Arid5b"        "Septin8"       "Trf"          
##  [77] "Slc25a13"      "Dusp16"        "AI987944"      "Snx19"        
##  [81] "St8sia4"       "F2rl3"         "Wsb2"          "Dcaf12"       
##  [85] "Rogdi"         "Rnf125"        "Slc40a1"       "Appl2"        
##  [89] "Fdft1"         "Itprip"        "Sirpb1a"       "Vav2"         
##  [93] "Ggt5"          "Ccnd2"         "Zfp512"        "Ctsl"         
##  [97] "Lgals8"        "Fxyd7"         "Pltp"          "Ddx20"        
## [101] "Golim4"        "Nr1d2"         "Epb41"         "Ttc28"        
## [105] "Elmsan1"       "Slc35a5"       "Plcxd2"        "Ffar1"        
## [109] "Pcyt2"         "Mylk"          "Slc15a4"       "Mcm6"         
## [113] "Mob3b"         "Fgl2"          "Calcoco1"      "Rexo5"        
## [117] "Ints12"        "Nfe2"          "Egr3"          "Maml2"        
## [121] "Zfp319"        "Slc39a4"       "St6galnac5"    "Cd84"         
## [125] "Ldlr"          "Htra2"         "Arrdc3"        "Trmt2b"       
## [129] "Srprb"         "Nudt4"         "Aco1"          "Mta3"         
## [133] "Arnt"          "Denn2b"        "Elmod2"        "Mpeg1"        
## [137] "Zfp318"        "Sqle"          "Adgrv1"        "Ets1"         
## [141] "BC052040"      "Slc24a3"       "Pgm2"          "Hsd17b7"      
## [145] "Vamp1"         "Trim12a"       "Srgap2"        "Gsdme"        
## [149] "Tpp2"          "Kcnip3"        "Rassf5"        "Las1l"        
## [153] "Stap1"         "Pdlim4"        "Tmem140"       "Adgrg3"       
## [157] "Cd68"          "Chd7"          "Zfp715"        "Rims3"        
## [161] "Defa17"        "Cep83"         "Ssh1"          "Pik3ip1"      
## [165] "Defa24"        "Fmo5"          "Acad11"        "Rfwd3"        
## [169] "Selenop"       "Ndc80"         "Zfp867"        "Mettl4"       
## [173] "Rasgef1b"      "Fdps"          "Stt3b"         "Tmem43"       
## [177] "Nlrp1a"        "Pgam1"         "Fasn"          "Nlrp12"       
## [181] "Dlg2"          "Ramp1"         "Aste1"         "Nek6"         
## [185] "Mfap1b"        "Lsp1"          "Golph3l"       "B3gnt7"       
## [189] "Dram2"         "Tctn3"         "Trim8"         "Stat4"        
## [193] "A130010J15Rik" "Idua"          "Galm"          "Fosl2"        
## [197] "Cytip"         "Hmox2"         "Il1r2"         "Clip1"        
## [201] "Socs2"         "Unc45a"        "Tinf2"         "Acot2"        
## [205] "Marchf7"       "Acvr1b"        "Traf7"         "Lpin1"        
## [209] "Tnfaip2"       "Gpr183"        "Irs2"          "Dtx4"         
## [213] "Krr1"          "Chd6"          "Lonp2"         "Nbr1"         
## [217] "Il13ra1"       "Tnfrsf26"      "Zswim4"        "Card11"       
## [221] "Nsun4"         "Cyp51"         "Aldh3b1"       "Csf3r"        
## [225] "Ss18"          "Zfp119b"       "Fnbp4"         "H2aw"         
## [229] "Nup155"        "Ccl4"          "Mcm7"          "Ppbp"         
## [233] "Timeless"      "Rab2b"         "Pnpo"          "Smg8"         
## [237] "Vps39"         "Dhcr7"         "Elmo2"         "Ankrd66"      
## [241] "Ace"           "Itpkb"         "Ubc"           "Hnrnpll"      
## [245] "Ercc5"         "Rrm1"          "Bex3"          "Xrcc1"        
## [249] "Notch4"        "Zfp110"        "Gab3"          "Nr4a3"        
## [253] "Tet2"          "Rcbtb2"        "Fos"           "Tgm1"         
## [257] "Frmd4b"        "Hp1bp3"        "Fam53c"        "Zscan26"      
## [261] "Hjurp"         "Ogfrl1"        "Stat2"         "Enpp5"        
## [265] "Atp8a1"        "Armcx5"        "Tbx21"         "Bbs9"         
## [269] "Nap1l5"        "Nemp2"         "Cyp1b1"        "Gm4707"       
## [273] "Zfp68"         "Copg1"         "Rapgef2"       "Gsto1"        
## [277] "Entpd1"        "Arhgef12"      "Tiam2"         "Crot"         
## [281] "Efcab14"       "Gpr55"         "Fam120b"       "Cers5"        
## [285] "Serinc5"       "Zdhhc9"        "Slc31a1"       "Gtpbp8"       
## [289] "Klhl21"        "Cln3"          "Sash1"         "Cd86"         
## [293] "Mpv17"         "Ppp1r3b"       "Pmvk"
DEG_0608.SI_Nup <- proc_DEG(DEG_0608.SI_PN, abs = FALSE, p.cut = 0.01, FC.cut = -1.5, padj = T)$gene
DEG_0608.SI_Nup
##   [1] "Myc"           "Clec4e"        "Snx10"         "H2-Q7"        
##   [5] "H2-Q6"         "Cxcl10"        "Icam1"         "Sod2"         
##   [9] "Slc2a6"        "Rps19"         "Nfkbia"        "Med11"        
##  [13] "Cs"            "Traf1"         "AB124611"      "Dusp2"        
##  [17] "C5ar1"         "Rpl10a"        "Ccl9"          "Gpr65"        
##  [21] "Clec4a2"       "Relb"          "Zc3h12a"       "Nfkbiz"       
##  [25] "Chst13"        "Itpkc"         "Olr1"          "Rpl13"        
##  [29] "Prkab2"        "Slpi"          "Tnf"           "Nfkbie"       
##  [33] "Slc11a1"       "Ehd1"          "Marcksl1"      "Rassf4"       
##  [37] "Ralgds"        "Padi4"         "Irak3"         "N4bp1"        
##  [41] "Acod1"         "Sema4d"        "Klhdc10"       "Eno1"         
##  [45] "Ier3"          "Rpl12"         "Il16"          "Cst7"         
##  [49] "Prdx5"         "Rack1"         "Nfkb2"         "Rps7"         
##  [53] "Pot1b"         "Rps10"         "Rpl8"          "Rpl5"         
##  [57] "Rpl32"         "Cd69"          "Fas"           "Gimap6"       
##  [61] "Capg"          "Rpsa"          "Slc39a1"       "Nfkbib"       
##  [65] "Rps2"          "Rps8"          "Ero1l"         "Ltb"          
##  [69] "Mapk6"         "Rps18"         "Rps15"         "Rps20"        
##  [73] "Atg16l2"       "Rpl13a"        "Pdlim7"        "Ccdc88b"      
##  [77] "Itgb7"         "Bcl3"          "Cd52"          "Trpm2"        
##  [81] "Notch1"        "Casp4"         "Rps5"          "Rplp0"        
##  [85] "Alox5ap"       "Rab32"         "Ston2"         "Ninj1"        
##  [89] "Itgal"         "Clec12a"       "Mrps18b"       "Rps15a"       
##  [93] "Rps16"         "Rnf149"        "Vasp"          "Ak2"          
##  [97] "Rps17"         "Foxn2"         "Rplp1"         "Txn1"         
## [101] "Rps3"          "C3ar1"         "Sell"          "Rps12"        
## [105] "H2-K1"         "Acp2"          "Cish"          "Rpl4"         
## [109] "Arap3"         "Emc6"          "Rpl17"         "Sec24a"       
## [113] "Emp3"          "Rnf19b"        "Rpl7"          "Adgre5"       
## [117] "Pus1"          "Dxo"           "Psme2"         "Apmap"        
## [121] "Mrpl16"        "Esrra"         "Il23a"         "Ccng2"        
## [125] "Rps27a"        "Gadd45b"       "Ifrd1"         "Dyrk2"        
## [129] "Nfat5"         "Rpl28"         "Nob1"          "Rps3a1"       
## [133] "Opa3"          "Slc29a2"       "Tnfaip3"       "Rps11"        
## [137] "Gm21188"       "Zdhhc18"       "Rps26"         "Rpl27a"       
## [141] "Mreg"          "Ggh"           "Il6"           "Rpl11"        
## [145] "Eef1g"         "Usp16"         "Rpl24"         "Tank"         
## [149] "Card19"        "Rpl6"          "Gpx1"          "Gmfg"         
## [153] "Rps4x"         "Gtf2ird2"      "Rps28"         "Tifa"         
## [157] "Ier5"          "Cd37"          "Rap1b"         "Rpl23"        
## [161] "Rpl14"         "Ctsz"          "Eif5a"         "Rpl18"        
## [165] "Cyba"          "Rpl23a"        "Map2k3"        "Herpud1"      
## [169] "S100a10"       "Mapkapk3"      "Rps6"          "Rpl21"        
## [173] "Id2"           "Txk"           "Ddit4"         "Rps24"        
## [177] "Arf4"          "Park7"         "Rplp2"         "Malt1"        
## [181] "Pilrb1"        "Rpl35"         "Hpn"           "Stk19"        
## [185] "Rps14"         "Eef1b2"        "Pa2g4"         "Rpl35a"       
## [189] "H2-Q4"         "Rpl15"         "Prg2"          "Rfx5"         
## [193] "Zeb2"          "Mapkapk2"      "Kdm6b"         "Rpl26"        
## [197] "Nop58"         "Rpl36a"        "Grcc10"        "Gpx4"         
## [201] "Zfp429"        "Serpinb1a"     "Lgals3"        "Uba52"        
## [205] "Noc2l"         "Rps23"         "Txnrd1"        "St3gal4"      
## [209] "C5ar2"         "Cst3"          "Pgk1"          "Orai2"        
## [213] "Rpl9"          "Rpl19"         "Rpl7a"         "Ssu72"        
## [217] "Sptssa"        "Rela"          "Rps25"         "Mllt6"        
## [221] "Atg9a"         "Fosl1"         "Siglecg"       "B2m"          
## [225] "Naca"          "Nampt"         "Cdk5r1"        "Cfb"          
## [229] "1700017B05Rik" "Tdg"           "Fgfr1"         "Rpl34"        
## [233] "Pi16"          "Asrgl1"        "Rpl39"         "Plek"         
## [237] "Gnai2"         "Pglyrp1"       "Nfkb1"         "Ifnar1"       
## [241] "Mtmr6"         "Snrpf"         "Tgfbi"         "Zfp593"       
## [245] "P2ry2"         "Slc38a2"       "G3bp1"         "Rab20"        
## [249] "Rpl30"         "Rpl27"         "Smdt1"         "Cxcl1"        
## [253] "Tgfb1"         "Igsf6"         "Anxa3"         "Ifngr2"       
## [257] "Bax"           "Sac3d1"        "Gna13"         "Cdc37"        
## [261] "S100a6"        "Rpl36al"       "Tpi1"          "Itgb1"        
## [265] "Serpinb2"      "Selenow"       "Birc3"         "Rpl37a"       
## [269] "Mif4gd"        "Il17ra"        "Trmt112"       "Pwp1"         
## [273] "Rpl18a"        "Tnip1"         "Rnd1"          "Tnfsf14"      
## [277] "Fau"           "Irf5"          "Osgep"         "Gm13212"      
## [281] "Nme2"          "Camk1"         "Arl13b"        "Mdh2"         
## [285] "Vdac3"         "Alas1"         "Apex1"         "Tagln2"       
## [289] "Rps13"         "Ly6e"          "Hint1"         "Pheta2"       
## [293] "Mrpl54"        "Rhov"          "Slc15a3"       "Cerk"         
## [297] "Ltv1"          "Gimap1"        "Wdr4"          "Llph"         
## [301] "Gpr84"         "Comtd1"        "Rpl41"         "Skil"         
## [305] "Sgms2"         "Aqp9"          "Prmt3"         "Dock10"       
## [309] "Sar1a"         "Tmcc3"         "Gimap5"        "Cflar"        
## [313] "Naga"          "Inpp5j"        "Ptgs2"         "Lfng"         
## [317] "Atp5h"         "Ppp1r11"       "Trp53"         "Ndufa6"       
## [321] "Krt80"         "Mrpl17"        "Mcl1"          "Ndufa13"      
## [325] "Rpl38"         "Chka"          "Rarg"          "Spata13"      
## [329] "Sec61b"        "Padi2"         "Cdc42ep2"      "Nedd4"        
## [333] "Gpr35"         "Arhgef3"       "Commd1b"       "Nme1"         
## [337] "Unc119"        "Crip1"         "Neurl3"        "Ap1s2"        
## [341] "Atp5e"         "Ttc39c"        "Zfp667"        "Rin3"         
## [345] "Irf2bp2"       "Cmtm7"         "Eef1akmt4"     "Pomp"         
## [349] "Aff1"          "Fam49b"        "Rcc2"          "Tgif1"        
## [353] "Crk"           "Sh2b2"         "Rbl2"          "Rpl37"        
## [357] "Tspo"          "Atad3a"        "Batf"          "Rps21"        
## [361] "Cox8a"         "Gnl1"          "Zc3hc1"        "Eef1e1"       
## [365] "Prelid1"       "Tsr1"          "Myl6"          "Gpatch4"      
## [369] "Nfkbid"        "Ube2e3"        "Eef1d"         "Naip5"        
## [373] "Klf10"         "Cotl1"         "Stk40"         "Dars"         
## [377] "Bst2"          "Arhgef18"      "Rrbp1"         "Vav3"         
## [381] "Pgls"          "Cox6a1"        "Rpl31"         "Rpl36"        
## [385] "Nqo1"          "Dot1l"         "Hspbp1"        "Cmpk1"        
## [389] "Agpat5"        "F10"           "Prkd3"         "H2-D1"        
## [393] "Klf7"          "Csrp1"         "Ube2f"         "Rps29"        
## [397] "Arfgef1"       "Cep85"         "Sema7a"        "Pfn1"         
## [401] "Fgfr1op"       "Fbxo11"        "Acot9"         "Flna"         
## [405] "Timm10"        "Adap1"         "Lacc1"         "Nhp2"         
## [409] "Gbp2"          "P2rx4"         "Gramd4"        "Vcl"          
## [413] "Hsd17b10"      "Rcn1"          "Sdf2"          "Ing2"         
## [417] "Mlec"          "Mrto4"         "Slc30a6"       "Stard5"       
## [421] "Il4"           "Lgmn"          "Nsa2"          "Tbc1d4"       
## [425] "Tmem63b"       "C1qbp"         "Sytl1"         "Baz1a"        
## [429] "Serbp1"        "Lmna"          "Tmem259"       "Eif3b"        
## [433] "Rbm8a"         "Taf7"          "Cox5b"         "Dennd4b"      
## [437] "Pabpn1"        "Ciapin1"       "Farsb"         "Ccdc9"        
## [441] "Katna1"        "Nadk"          "Mgst1"         "Ahnak"        
## [445] "Rpl22"         "Gpank1"        "Ebi3"          "Ten1"         
## [449] "Trem3"         "Lst1"          "Il1a"          "Ccl6"
Aname = "SI_P" 
Bname = "SI_N" 

Aidx = grep("SI_P",colnames(mat.SI_PN))
Bidx = grep("SI_N",colnames(mat.SI_PN))

Aidx.raw = grep("SI_P",colnames(mat.raw))
Bidx.raw = grep("SI_N",colnames(mat.raw))

Aidx.filt = grep("SI_P1|SI_P3|SI_P4|SI_P5|SI_P8",colnames(mat.raw))
Bidx.filt = grep("SI_N2|SI_N3|SI_N4|SI_N5|SI_N7|SI_N8",colnames(mat.raw))

## P: 0.01, FC: 1.5
## 
## up: 295
## down: 452
## total: 747

load pure obj

GEX.seur_pure <- readRDS("./scEOS.preAnno_0705.pure.rds")
GEX.seur_pure
## An object of class Seurat 
## 14783 features across 3205 samples within 1 assay 
## Active assay: RNA (14783 features, 1500 variable features)
##  3 dimensional reductions calculated: pca, tsne, umap

check

GEX.seur <- GEX.seur_pure
color.pre1 <-  ggsci::pal_igv("default")(49)[c(1,5,7,31,33,
                                               19,
                                               16,
                                               2)]
color.pre2 <- ggsci::pal_igv("default")(49)[c(7,
                                               19,
                                               16,
                                               2)]
sl_stat <- table(GEX.seur$preAnno1)
barplot(sl_stat,ylim = c(0,1100),col = color.pre1,
        main = "preAnno1 statistics, EOS subset",cex.names = 0.75)
text(x=1:8*1.2-0.45,y=sl_stat+75,paste0(sl_stat,"\n",100*round(as.numeric(sl_stat/sum(sl_stat)),4),"%"),cex = 0.75)

(DimPlot(GEX.seur_pure, reduction = "umap", group.by = "preAnno1", label = T, label.size = 3.5,repel = T,
          cols =color.pre1) + NoLegend()) +
  (DimPlot(GEX.seur_pure, reduction = "umap", group.by = "preAnno2", label = T, label.size = 3.5,repel = T,
          cols =color.pre2) + NoLegend())

re-clustering

GEX.seur <- FindVariableFeatures(GEX.seur, selection.method = "vst", nfeatures = 1500)

# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(GEX.seur), 10)

# plot variable features with and without labels
plot1 <- VariableFeaturePlot(GEX.seur)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2

head(VariableFeatures(GEX.seur), 200)
##   [1] "Olfm4"         "Rsad2"         "Saa3"          "Ccl9"         
##   [5] "Gm42418"       "Slpi"          "Ptgs2"         "Hspa1a"       
##   [9] "Ifitm1"        "Hspa1b"        "S100a9"        "Malat1"       
##  [13] "Serpine1"      "Ppbp"          "Nrarp"         "Cxcl3"        
##  [17] "Fos"           "Peg10"         "Dusp1"         "Ccl4"         
##  [21] "Egr3"          "Nr4a2"         "Id2"           "Ahnak"        
##  [25] "Cxcl10"        "Acod1"         "Retnla"        "Ccl3"         
##  [29] "Il23a"         "Vcam1"         "Mt1"           "Il6"          
##  [33] "Bmp2"          "Dnaja4"        "Mcfd2"         "Thbs1"        
##  [37] "Gm17494"       "Clec4e"        "Osm"           "Rgs1"         
##  [41] "Dusp2"         "Igha"          "Phlda1"        "Thbd"         
##  [45] "Dnajb1"        "Igkc"          "Glul"          "Jun"          
##  [49] "Gata2"         "Olfr889"       "Zfp36l1"       "Tmem80"       
##  [53] "Rgcc"          "Myc"           "Terf2ip"       "Cd69"         
##  [57] "Ctsl"          "Idi1"          "Csf1"          "Klf2"         
##  [61] "Bambi"         "Hspe1"         "Il1b"          "Tnf"          
##  [65] "Fbxl5"         "Il4"           "Spp1"          "AA467197"     
##  [69] "Cmpk2"         "Hsph1"         "2810403D21Rik" "Gm6377"       
##  [73] "Bag3"          "Vegfa"         "H1f0"          "Nfkbia"       
##  [77] "Map3k8"        "Gclc"          "Jchain"        "Hs3st1"       
##  [81] "Cldn1"         "Cdkn1a"        "Hmox1"         "Olr1"         
##  [85] "Taf4b"         "Retnlg"        "Ipmk"          "Ccl6"         
##  [89] "Mrpl16"        "Klf4"          "Dio2"          "Card6"        
##  [93] "Defb40"        "Rcan1"         "Hist1h3a"      "Gm34589"      
##  [97] "Serpinb2"      "Ctsd"          "Tnfaip3"       "Dusp5"        
## [101] "Dnmt3b"        "2310040G24Rik" "Rgs2"          "Tns3"         
## [105] "Ncapg2"        "Hist1h1c"      "Tent5a"        "Kctd12"       
## [109] "Cxcl1"         "Chka"          "Ddit4"         "Kifap3"       
## [113] "Cep19"         "Cxcl9"         "Pard6g"        "Ifitm2"       
## [117] "Ptch1"         "Lrrc20"        "Nat10"         "Cxcl2"        
## [121] "Bnip3"         "Cnbd2"         "5830416I19Rik" "Ly6g5b"       
## [125] "Cyp1b1"        "Hhex"          "Cdc42bpg"      "Lmna"         
## [129] "Bola1"         "Marcks"        "Hbegf"         "Gadd45b"      
## [133] "Acad10"        "Klf5"          "AW146154"      "Tnfaip2"      
## [137] "1110019D14Rik" "Gem"           "Gpr183"        "Hint2"        
## [141] "Txndc5"        "Cited2"        "Gm32856"       "Gramd1c"      
## [145] "Rnd1"          "Gm20234"       "Pear1"         "Ctsc"         
## [149] "F10"           "Col4a2"        "Tbx2"          "Rhob"         
## [153] "Wfdc17"        "Nfkbiz"        "Hes1"          "Grm6"         
## [157] "Cyp26a1"       "B230354K17Rik" "Wdr4"          "Gm29554"      
## [161] "Tubd1"         "Pou4f1"        "Ppic"          "Ms4a6d"       
## [165] "Vgf"           "Defa22"        "4933432I03Rik" "Nr4a1"        
## [169] "Plin2"         "Hmgcs1"        "Haus7"         "Slc40a1"      
## [173] "Klrb1f"        "Hk1os"         "Gadd45g"       "Egr1"         
## [177] "Zfp932"        "Marcksl1"      "Hsp90aa1"      "Dnaja1"       
## [181] "Ccdc73"        "Bcl2l14"       "Zfp583"        "Lfng"         
## [185] "6720427I07Rik" "Hspd1"         "Egr2"          "Enah"         
## [189] "Mturn"         "Nat8l"         "Mrgpra2b"      "Resf1"        
## [193] "Defa30"        "Zfp36l2"       "Vars"          "Slc38a7"      
## [197] "0610010F05Rik" "Zfp503"        "Gm29650"       "Bcl2l11"

PCA

# exclude MT genes,  and more possible contamination genes
MT_gene <- grep("^mt-",rownames(GEX.seur),value = T)
MT_filt <- MT_gene %in% rownames(GEX.seur@assays[['RNA']]@counts)

#DIG <- grep("^Tra|^Trb|^Trg|^Trd|^Tcr|^Igm|^Igh|^Igk|^Igl|Jchain|^Hsp|^Rps|^Rpl|Hbb-|Hba-|^Dnaj|^AY|^Gm|^Hist",rownames(GEX.seur),value = T)

DIG <- grep("^Tra|^Trb|^Trg|^Trd|^Tcr|^Igm|^Igh|^Igk|^Igl|Jchain|Mzb1|Lars2|^Hsp|^Rps|^Rpl|Hbb-|Hba-|^Dnaj|Fos|Jun|^AY|^AW|^AA|^Gm|^Hist|^0|^1|^2|^3|^4|^5|^6|^7|^8|^9|Rik$",
            rownames(GEX.seur),value = T)
CC_gene <- Hmisc::capitalize(tolower(as.vector(unlist(cc.genes.updated.2019))))


GEX.seur <- RunPCA(GEX.seur, features = setdiff(VariableFeatures(object = GEX.seur),
                                                c(MT_gene,
                                                  DIG, 
                                                  CC_gene) ))
## PC_ 1 
## Positive:  Idi1, Msmo1, Ccl4, Rgs1, Ldlr, Il1r2, Cyp51, Cyp1b1, Slc40a1, Dio2 
##     Glipr1l1, Hmgcs1, Insig1, Nabp1, Fdft1, Egr2, Stard4, Hsd17b7, Ldhc, Cdh17 
##     Rhoh, Ccl3, Sc5d, Atp1b1, Elmsan1, Slc7a11, Ccr1, Fgl2, Pik3ap1, Dpep2 
## Negative:  Nfkbia, Tnfaip3, Nfkbiz, Icam1, Sod2, Ptgs2, Marcksl1, Dusp2, Clec4e, Tnf 
##     Ccl6, Olr1, Serpine1, Cish, Gpr65, Ehd1, Acod1, Rnd1, Fas, Ier2 
##     Cd69, Igsf6, Cxcl10, Ier3, Zfp36, Tagln2, Herpud1, Cdkn1a, Klf2, C5ar1 
## PC_ 2 
## Positive:  Slc7a11, Cs, Ell2, Pik3ap1, Icam1, Snx10, Basp1, Fas, Cxcl3, Sec24a 
##     Clec4e, Thbs1, Sgms2, Cd274, N4bp1, Ctsz, Sqstm1, Ralgds, Ehd1, Gabpb1 
##     Ccng2, Nfkbiz, Ldlr, Igsf6, Dusp16, Zeb2, Atp2b1, Olr1, Idi1, Brwd1 
## Negative:  Egr1, Klf2, Btg2, Ubc, Zfp36, Ier2, G0s2, Rgs2, Pmaip1, Bag3 
##     Gadd45a, Dusp1, Rhob, Tsc22d3, Tob1, Atf3, Zfp36l2, Nr4a1, Hes1, Mylip 
##     Klf6, Zfand2a, Tent5a, Adrb2, Hmox1, Plin2, Cdkn1a, Mtus1, Klf5, Crem 
## PC_ 3 
## Positive:  Ccl3, Il1a, Ftl1, Cxcl2, Hcar2, Cd274, Ccrl2, Gadd45b, Tnfaip2, Ccl4 
##     Pik3ap1, Il1b, Retnlg, Ubc, Nampt, Il1r2, Gde1, Rgs1, Igsf6, Bhlhe40 
##     Icam1, Tagap, Rab32, Ccng2, Mxd1, Nrros, Cs, Tnfaip3, Ddit3, Prdx6 
## Negative:  Ccl9, Lfng, Sell, Ahnak, Fgfr1, S100a6, AB124611, Osm, Coro2a, Cdkn1a 
##     Rgs2, Krt80, Lmna, Cd300lb, Dusp6, Rassf4, Emp3, Dusp5, Gpr35, Sema4d 
##     Vegfa, Slc11a1, Klf2, Vmn2r26, Irf2bpl, Aak1, Slc25a37, Myc, Nrarp, Isg15 
## PC_ 4 
## Positive:  Cd69, Il1a, Gadd45a, Alas1, Ffar2, P2ry13, Tnfaip2, Sod2, Cxcl2, S100a9 
##     Zfp36, Ralgds, Cs, B3gnt2, Vcam1, Tifa, Ptgs2, Gem, Zfp131, Rhoh 
##     Lbh, Arf2, Tnfaip6, Il1b, Icam1, Nfkbiz, Med21, Trem1, Rab32, Tnf 
## Negative:  Ccl9, Basp1, Fgfr1, Sell, Ccl6, Lfng, Ahnak, Ftl1, Thbs1, Sema4d 
##     Dusp1, Ctsd, Krt80, Coro2a, AB124611, Emilin2, Snx20, Tagln2, Lmna, Osm 
##     Gpr35, Cd300lb, Cdkn1a, Dusp5, Emp3, Vegfa, Tmem189, Nrros, Zeb2, Csf2rb 
## PC_ 5 
## Positive:  Retnlg, Dusp6, Thbs1, Il1r2, Cxcl3, Ccl6, Rgs2, Csrp1, Basp1, Ldhc 
##     Ccr1, P2ry10, Socs2, Mylip, Nabp1, D16Ertd472e, Alas1, Zfhx3, Stx11, Emb 
##     Cyld, Csf2rb2, B3gnt5, Osgin1, Mmp8, Gpr34, Slc16a3, Ifitm1, Plac8, Ap1s1 
## Negative:  Hmgcs1, Idi1, Ccl3, Rilpl2, Zfp36, Msmo1, Egr2, Ccl4, Fdft1, Insig1 
##     Ccl9, Stard4, Hsd17b7, Ldlr, Egr3, Ptgs2, Cyp51, Sell, Nfkbia, Cyp1b1 
##     Glipr1l1, Tagap, Bhlhe40, Ccrl2, Dusp2, Cdh17, Nr4a1, Slc20a1, Krt80, Csf1
length(setdiff(VariableFeatures(object = GEX.seur),
                                                c(MT_gene,
                                                  DIG, 
                                                  CC_gene)))
## [1] 1268
head(setdiff(VariableFeatures(object = GEX.seur),
                                                c(MT_gene,
                                                  DIG, 
                                                  CC_gene)),100)
##   [1] "Olfm4"    "Rsad2"    "Saa3"     "Ccl9"     "Slpi"     "Ptgs2"   
##   [7] "Ifitm1"   "S100a9"   "Malat1"   "Serpine1" "Ppbp"     "Nrarp"   
##  [13] "Cxcl3"    "Peg10"    "Dusp1"    "Ccl4"     "Egr3"     "Nr4a2"   
##  [19] "Id2"      "Ahnak"    "Cxcl10"   "Acod1"    "Retnla"   "Ccl3"    
##  [25] "Il23a"    "Vcam1"    "Mt1"      "Il6"      "Bmp2"     "Mcfd2"   
##  [31] "Thbs1"    "Clec4e"   "Osm"      "Rgs1"     "Dusp2"    "Phlda1"  
##  [37] "Thbd"     "Glul"     "Gata2"    "Olfr889"  "Zfp36l1"  "Tmem80"  
##  [43] "Rgcc"     "Myc"      "Terf2ip"  "Cd69"     "Ctsl"     "Idi1"    
##  [49] "Csf1"     "Klf2"     "Bambi"    "Il1b"     "Tnf"      "Fbxl5"   
##  [55] "Il4"      "Spp1"     "Cmpk2"    "Bag3"     "Vegfa"    "H1f0"    
##  [61] "Nfkbia"   "Map3k8"   "Gclc"     "Hs3st1"   "Cldn1"    "Cdkn1a"  
##  [67] "Hmox1"    "Olr1"     "Taf4b"    "Retnlg"   "Ipmk"     "Ccl6"    
##  [73] "Mrpl16"   "Klf4"     "Dio2"     "Card6"    "Defb40"   "Rcan1"   
##  [79] "Serpinb2" "Ctsd"     "Tnfaip3"  "Dusp5"    "Dnmt3b"   "Rgs2"    
##  [85] "Tns3"     "Ncapg2"   "Tent5a"   "Kctd12"   "Cxcl1"    "Chka"    
##  [91] "Ddit4"    "Kifap3"   "Cep19"    "Cxcl9"    "Pard6g"   "Ifitm2"  
##  [97] "Ptch1"    "Lrrc20"   "Nat10"    "Cxcl2"
DimPlot(GEX.seur, reduction = "pca",dims = 1:2, group.by = "orig.ident") +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4, group.by = "orig.ident")

DimPlot(GEX.seur, reduction = "pca",dims = 1:2) +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4)

DimPlot(GEX.seur, reduction = "pca",dims = 1:2,group.by = "preAnno1", cols = color.pre1) +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4,group.by = "preAnno1", cols = color.pre1)

here could learn that re-clustering would result in totally different subtypes
maybe the highly variable across all raw data including contaminated celltypes misleaded the pre-ones

DimPlot(GEX.seur, reduction = "pca",dims = 1:2,group.by = "preAnno1", split.by = "preAnno1",cols = color.pre1) +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4,group.by = "preAnno1", split.by = "preAnno1", cols = color.pre1)

DimHeatmap(GEX.seur, dims = 1:16, cells = 1500, balanced = TRUE,ncol = 4)

PC1 top features

GEX.seur@reductions$pca@feature.loadings[1:4,1:4]
##               PC_1          PC_2         PC_3         PC_4
## Olfm4  0.017930822  1.186807e-02 -0.001308903  0.016268377
## Rsad2 -0.017752653 -3.698418e-02  0.007565388  0.003095942
## Saa3  -0.009005394  5.570186e-05  0.013298256  0.006550738
## Ccl9  -0.075744917 -1.790766e-02 -0.085908611 -0.225254006
PC1.feat <- as.data.frame(GEX.seur@reductions$pca@feature.loadings) %>% arrange(desc(PC_1))
head(PC1.feat[,1:4],40)
##                PC_1         PC_2         PC_3         PC_4
## Idi1     0.13116501  0.052652721  0.064459830 -0.026359558
## Msmo1    0.12502968  0.035547236  0.040542912 -0.035080748
## Ccl4     0.11818642  0.031057064  0.114000559 -0.008791548
## Rgs1     0.11048580 -0.048634344  0.100356723 -0.073594681
## Ldlr     0.10236005  0.058393632  0.036940936 -0.043864092
## Il1r2    0.09626994  0.015998870  0.102145124 -0.050990925
## Cyp51    0.09467618 -0.001297847  0.015208247 -0.018707150
## Cyp1b1   0.09255330 -0.017337762  0.004511423  0.025439645
## Slc40a1  0.09152862 -0.060046480  0.071641052 -0.058573277
## Dio2     0.09078752 -0.058420022  0.032222609  0.015170615
## Glipr1l1 0.08927244 -0.002862569  0.015280147 -0.014808739
## Hmgcs1   0.08664243  0.047707839  0.056301673 -0.041832478
## Insig1   0.08228419  0.023626271  0.060179154 -0.025703437
## Nabp1    0.07899073 -0.044208543  0.059343008  0.037372799
## Fdft1    0.07732430  0.026775975  0.028739391 -0.026656841
## Egr2     0.07547647 -0.069924966  0.069002492  0.012855772
## Stard4   0.07466364  0.042693938  0.080481507 -0.040942743
## Hsd17b7  0.07382651  0.031977026  0.015261175 -0.018187356
## Ldhc     0.07307437 -0.046507977  0.081996896 -0.059188137
## Cdh17    0.07249662 -0.011049489  0.011406762  0.003639105
## Rhoh     0.07233359  0.005480225  0.037067362  0.049676741
## Ccl3     0.07112576  0.032234369  0.163143996 -0.024437920
## Sc5d     0.07010521  0.012058727  0.057921409 -0.067252954
## Atp1b1   0.06877144 -0.011098728  0.050921742  0.009865678
## Elmsan1  0.06872734  0.029162539  0.041282733 -0.051317002
## Slc7a11  0.06698718  0.146087833  0.067214314 -0.042927722
## Ccr1     0.06441165  0.023880324  0.074797442 -0.074219431
## Fgl2     0.06437502 -0.040139403  0.028520916 -0.040814315
## Pik3ap1  0.06315070  0.110064557  0.110217392 -0.062424961
## Dpep2    0.06300099 -0.003197856  0.002138706 -0.009165577
## Malat1   0.06293595  0.009206825  0.004069682 -0.006438979
## Slco4a1  0.05992240 -0.026516250  0.045336764 -0.018018254
## Dusp6    0.05800138  0.002102136 -0.031398287  0.008216413
## Fcrls    0.05589529 -0.033548344  0.045027989 -0.023403732
## Ifitm2   0.05472027  0.018747656  0.076243626 -0.039710284
## Fndc5    0.05454457 -0.009123465  0.015002589 -0.030540677
## Gns      0.05147718  0.015078432  0.065769534 -0.028433156
## Sh2d3c   0.05115773  0.027138226  0.055407195 -0.053551418
## Tnfaip2  0.04815093 -0.008951689  0.121704861  0.074154829
## Degs1    0.04733860 -0.001327432  0.075457165  0.026495646
tail(PC1.feat[,1:4],40)
##                 PC_1          PC_2         PC_3          PC_4
## Pmaip1   -0.06994725 -0.1385218240 -0.002026882 -0.0082436200
## Dusp1    -0.07029127 -0.1148252809  0.001924238 -0.1221484426
## Tifa     -0.07035609 -0.0153562777  0.034425144  0.0542629206
## Il6      -0.07197568  0.0011275621  0.023779568  0.0162289821
## Ccrl2    -0.07204581 -0.0060110938  0.136669050 -0.0535069069
## Slpi     -0.07301835 -0.0183106657 -0.016514537 -0.0449538863
## Snx10    -0.07411937  0.1010920489  0.071716842  0.0177859451
## Ddit4    -0.07505936 -0.0158733134 -0.017490531 -0.0613190024
## Ccl9     -0.07574492 -0.0179076555 -0.085908611 -0.2252540065
## Btg2     -0.07597167 -0.1709089825  0.036136644 -0.0107129261
## C5ar1    -0.07713393  0.0256779469 -0.014516290 -0.0458001871
## Klf2     -0.07965630 -0.2004608603 -0.024983475  0.0390528687
## Cdkn1a   -0.08046891 -0.0826729120 -0.037313836 -0.0941698148
## Herpud1  -0.08231929  0.0203577115  0.061525747 -0.0004599810
## Tagln2   -0.08345165 -0.0183168165 -0.013022175 -0.1077830659
## Zfp36    -0.08422684 -0.1525399194  0.022944648  0.0663968567
## Ier3     -0.08488010 -0.0343754741  0.034832307  0.0159548304
## Cxcl10   -0.08598735  0.0201809975  0.028190003  0.0126364097
## Igsf6    -0.08739591  0.0583020261  0.100140342  0.0408389725
## Cd69     -0.08756849  0.0203651070  0.079773668  0.1339736803
## Ier2     -0.08980315 -0.1522706228  0.019194856 -0.0191168815
## Fas      -0.09379974  0.0926573192  0.030198966 -0.0055845642
## Rnd1     -0.09474455 -0.0662602800  0.043170858  0.0394525201
## Acod1    -0.09528568  0.0498494465  0.073641959  0.0103958902
## Ehd1     -0.09968786  0.0611061767 -0.013417982 -0.0406199208
## Gpr65    -0.09983300 -0.0024150419  0.010816220 -0.0398323169
## Cish     -0.10109687  0.0005142822  0.020406934 -0.0211874879
## Serpine1 -0.10142070  0.0154618177  0.044857943 -0.0140812278
## Olr1     -0.10253967  0.0537097428  0.064051578  0.0161334125
## Ccl6     -0.10515808 -0.0363296252  0.029044283 -0.1502572559
## Tnf      -0.10626416  0.0036861260  0.041049196  0.0414425679
## Clec4e   -0.10914466  0.0755839278  0.058857334 -0.0243674394
## Dusp2    -0.11847378 -0.0039024428  0.020647541  0.0005016964
## Marcksl1 -0.12133702  0.0079185809  0.031673528  0.0354404454
## Ptgs2    -0.12372983  0.0286258542  0.066004494  0.0527630468
## Sod2     -0.12642761  0.0409568872  0.058074042  0.0697876218
## Icam1    -0.12743249  0.1084074015  0.093736443  0.0459271561
## Nfkbiz   -0.18356200  0.0598775626  0.081254670  0.0456939209
## Tnfaip3  -0.18432446 -0.0222723540  0.086080136  0.0245963330
## Nfkbia   -0.21367925  0.0162916036  0.052004602  0.0403621616
in SS2

surprisingly, here we could learn that Nmur1+/Nmur1- EOS are naturally separated by PC1 top features !

scEOS and SS2 bulk data are basically matched.

Run UMAP/tSNE

decide PCs to use
ElbowPlot(GEX.seur,ndims = 40)

PCs <- 1:10
GEX.seur <- FindNeighbors(GEX.seur, dims = PCs, k.param = 50)
## Computing nearest neighbor graph
## Computing SNN
GEX.seur <- FindClusters(GEX.seur, resolution = 0.3, method = 'igraph')
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3205
## Number of edges: 268392
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8019
## Number of communities: 3
## Elapsed time: 0 seconds
GEX.seur <- RunTSNE(GEX.seur, dims=PCs, complexity=20)
GEX.seur <- RunUMAP(GEX.seur, dims=PCs, n.neighbors = 50)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 17:51:04 UMAP embedding parameters a = 0.9922 b = 1.112
## 17:51:04 Read 3205 rows and found 10 numeric columns
## 17:51:04 Using Annoy for neighbor search, n_neighbors = 50
## 17:51:04 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 17:51:05 Writing NN index file to temp file C:\Users\Shaorui\AppData\Local\Temp\Rtmp63OVnz\file6ea4600d3248
## 17:51:05 Searching Annoy index using 1 thread, search_k = 5000
## 17:51:06 Annoy recall = 100%
## 17:51:07 Commencing smooth kNN distance calibration using 1 thread
## 17:51:08 Initializing from normalized Laplacian + noise
## 17:51:08 Commencing optimization for 500 epochs, with 203390 positive edges
## 17:51:18 Optimization finished
DimPlot(GEX.seur, reduction = "tsne", label = T,group.by = "seurat_clusters") + DimPlot(GEX.seur, reduction = "umap", label = T,group.by = "seurat_clusters")

DimPlot(GEX.seur, reduction = "tsne", label = T,group.by = "preAnno1",split.by = "preAnno1",cols = color.pre1) + 
  DimPlot(GEX.seur, reduction = "umap", label = T,group.by = "preAnno1",split.by = "preAnno1",cols = color.pre1)

GEX.seur$Anno1 <- as.character(GEX.seur$seurat_clusters)

GEX.seur$Anno1[GEX.seur$Anno1 %in% c(1)] <- "EOS1"
GEX.seur$Anno1[GEX.seur$Anno1 %in% c(0)] <- "EOS2"
GEX.seur$Anno1[GEX.seur$Anno1 %in% c(2)] <- "EOS3"

GEX.seur$Anno1 <- factor(GEX.seur$Anno1,
                         levels = paste0("EOS",1:3))
color.A1 <- ggsci::pal_igv("default")(49)[c(2,
                                            33,44)]
  
#color.A2 <- 
DimPlot(GEX.seur, reduction = "tsne", label = T,group.by = "Anno1",cols = color.A1) + DimPlot(GEX.seur, reduction = "umap", label = T,group.by = "Anno1",cols = color.A1)

cowplot::plot_grid(
AugmentPlot(DimPlot(GEX.seur, reduction = "tsne", label = T,group.by = "Anno1",cols = color.A1, pt.size = 3.2, repel = T, label.size = 12)),
  AugmentPlot(DimPlot(GEX.seur, reduction = "umap", label = T,group.by = "Anno1",cols = color.A1, pt.size = 3.2)) ,
  (DimPlot(GEX.seur, reduction = "umap", label = F,group.by = "Anno1",cols = color.A1, pt.size = 0,cells = c(1:50))+ theme(plot.margin = unit(c(0,1,0,2),'cm')) +
     labs(x="",y="",title = "") + theme(axis.line.x = element_blank(),
                                        axis.ticks.x = element_blank(),
                                        axis.text.x = element_blank(),
                                        axis.line.y = element_blank(),
                                        axis.ticks.y = element_blank(),
                                        axis.text.y = element_blank())),
ncol = 3, rel_widths = c(4.5,4.5,1.8))

pp.umap2 <- cowplot::plot_grid(
AugmentPlot(DimPlot(GEX.seur, reduction = "tsne", label = F,group.by = "Anno1",cols = color.A1, pt.size = 3.2, repel = T, label.size = 12)),
  AugmentPlot(DimPlot(GEX.seur, reduction = "umap", label = F,group.by = "Anno1",cols = color.A1, pt.size = 3.2)) ,
  get_legend(DimPlot(GEX.seur, reduction = "umap", label = F,group.by = "Anno1",cols = color.A1)),
ncol = 3, rel_widths = c(4.5,4.5,1))
pp.umap2

pp.umap1 <- cowplot::plot_grid(
AugmentPlot(DimPlot(GEX.seur, reduction = "tsne", label = F,group.by = "seurat_clusters", pt.size = 3.2, repel = T, label.size = 12)),
  AugmentPlot(DimPlot(GEX.seur, reduction = "umap", label = F,group.by = "seurat_clusters", pt.size = 3.2)) ,
  get_legend(DimPlot(GEX.seur, reduction = "umap", label = F,group.by = "seurat_clusters")),
ncol = 3, rel_widths = c(4.5,4.5,1))
pp.umap1

sl_stat <- table(GEX.seur$Anno1)
barplot(sl_stat,ylim = c(0,2100),col = color.A1,
        main = "Anno1 statistics, EOS",cex.names = 0.75)
text(x=1:3*1.2-0.45,y=sl_stat+155,paste0(sl_stat,"\n",100*round(as.numeric(sl_stat/sum(sl_stat)),4),"%"),cex = 0.75)

markers.SItop <- read.csv("J:/projects_local/projects/20210608_SS2_LY/mod_20220117/tissues_comp/TissueSpecific/multi_plot/Specificity_plot.SI_top1k.mod20220530.csv", row.names = 1)
DotPlot(GEX.seur, features = rev(markers.SItop[1:20,]$gene), group.by = "Anno1", cols = c("lightgrey","red")) + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6)) + labs(title="Anno1, SI EOS top20 unique")

DimPlot(GEX.seur, reduction = "tsne", label = T,group.by = "Anno1",split.by = "Anno1",cols = color.A1) + 
  DimPlot(GEX.seur, reduction = "umap", label = T,group.by = "Anno1",split.by = "Anno1",cols = color.A1)

DimPlot(GEX.seur, reduction = "pca",dims = 1:2,group.by = "Anno1", cols = color.A1) +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4,group.by = "Anno1", cols = color.A1)

DimPlot(GEX.seur, reduction = "pca",dims = 1:2,group.by = "Anno1", split.by = "Anno1",cols = color.A1) +
  DimPlot(GEX.seur, reduction = "pca",dims = 3:4,group.by = "Anno1", split.by = "Anno1", cols = color.A1)

recheck QC

VlnPlot(GEX.seur, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0.1, group.by = "preAnno1",cols =color.pre1)

VlnPlot(GEX.seur, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0.1, group.by = "Anno1",cols =color.A1)

plota <- FeatureScatter(GEX.seur, feature1 = "nCount_RNA", feature2 = "percent.mt", group.by = "preAnno1",cols =color.pre1) 
             #coord_cartesian(xlim =c(0, 40000), ylim = c(0, 40))
plotb <- FeatureScatter(GEX.seur, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "preAnno1",cols =color.pre1)
             #coord_cartesian(xlim =c(0, 40000), ylim = c(0, 4000))
plota + plotb

plota <- FeatureScatter(GEX.seur, feature1 = "nCount_RNA", feature2 = "percent.mt", group.by = "Anno1",cols =color.A1) 
             #coord_cartesian(xlim =c(0, 40000), ylim = c(0, 40))
plotb <- FeatureScatter(GEX.seur, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "Anno1",cols =color.A1)
             #coord_cartesian(xlim =c(0, 40000), ylim = c(0, 4000))
plota + plotb

markers

check sig score

score.SI_Nup <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          DEG_0608.SI_Nup)
## Summarizing data
score.SI_Pup <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          DEG_0608.SI_Pup)
## Summarizing data
GEX.seur <- AddMetaData(GEX.seur,
                        score.SI_Nup,
                        "score.SI_Nup")
GEX.seur <- AddMetaData(GEX.seur,
                        score.SI_Pup,
                        "score.SI_Pup")
vln.score.SI_Nup <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Nup, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS1","EOS2"),
                                                  c("EOS2","EOS3"),
                                                  c("EOS1","EOS3")),size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
     scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI Nmur1_PvsN, Nup") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))

vln.score.SI_Pup <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Pup, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS2","EOS3"),
                                                  c("EOS1","EOS2"),
                                                  c("EOS1","EOS3")),size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
    scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI Nmur1_PvsN, Pup") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))
#vln.score.SI_Pup
#+ NoLegend()
cowplot::plot_grid(
  vln.score.SI_Nup,
  vln.score.SI_Pup,
  ncol = 2
)

#markers.SItop <- read.csv("I:/Shared_win/projects/20210608_SS2_LY/mod_20220117/tissues_comp/TissueSpecific/multi_plot/Specificity_plot.SI_top1k.mod20220530.csv", row.names = 1)
  
markers.SItop[1:5,]
##            gene log2FoldChange   logCPM        LR        pvalue          padj
## Klra17   Klra17       7.151111 2.442868  74.39249  6.403223e-18  1.853922e-15
## Nmur1     Nmur1       6.925340 2.609010  87.50544  8.404842e-21  3.447386e-18
## Cdh17     Cdh17       8.412771 5.002555 365.78156  1.551294e-81  5.090313e-78
## Clec4a4 Clec4a4       8.205885 5.297865 466.87132 1.536726e-103 7.563767e-100
## Dpep2     Dpep2       8.042375 5.978916 531.25372 1.507289e-117 1.483775e-113
##               FC entropy SI.tpm
## Klra17  142.1343  0.0000  23.07
## Nmur1   121.5445  0.0532  26.35
## Cdh17   340.7975  0.1132  58.38
## Clec4a4 295.2688  0.1527 486.93
## Dpep2   263.6307  0.1732 221.24
markers.SItop[1:20,]$gene
##  [1] "Klra17"  "Nmur1"   "Cdh17"   "Clec4a4" "Dpep2"   "Ffar1"   "F2rl3"  
##  [8] "Sirpb1a" "Cyp1b1"  "Cd22"    "Jaml"    "Dio2"    "Hcar2"   "Dpep3"  
## [15] "Ptger3"  "Gpr171"  "Jchain"  "Ccl4"    "Ggta1"   "Abi3"
markers.SItop[1:40,]$gene
##  [1] "Klra17"        "Nmur1"         "Cdh17"         "Clec4a4"      
##  [5] "Dpep2"         "Ffar1"         "F2rl3"         "Sirpb1a"      
##  [9] "Cyp1b1"        "Cd22"          "Jaml"          "Dio2"         
## [13] "Hcar2"         "Dpep3"         "Ptger3"        "Gpr171"       
## [17] "Jchain"        "Ccl4"          "Ggta1"         "Abi3"         
## [21] "Gngt2"         "Il1r2"         "Gtf2ird1"      "Clec4b2"      
## [25] "Esam"          "Acp5"          "Ahrr"          "Hepacam2"     
## [29] "Gbp4"          "Gbp8"          "Pdlim4"        "Piwil4"       
## [33] "Fcgr4"         "P2ry13"        "Pilra"         "Adamdec1"     
## [37] "Chac1"         "2510009E07Rik" "Plscr1"        "Clec4b1"
tail(markers.SItop[1:100,])
##            gene log2FoldChange   logCPM        LR       pvalue         padj
## Galm       Galm       3.384603 3.781498  47.49677 5.509473e-12 6.093849e-10
## Ppfibp2 Ppfibp2       3.909049 4.126379  69.87176 6.328795e-17 1.639491e-14
## Cxcr2     Cxcr2       3.461298 3.725665  28.22164 1.081881e-07 4.400842e-06
## Cd300c   Cd300c       3.176197 1.723846  15.00855 1.070252e-04 1.563140e-03
## Adgrv1   Adgrv1       3.605908 3.533204  73.27722 1.126606e-17 2.997380e-15
## Gns         Gns       3.424218 6.301935 153.19273 3.476587e-35 3.422353e-32
##                FC entropy SI.tpm
## Galm    10.444000  1.6580  24.58
## Ppfibp2 15.022455  1.6591  23.31
## Cxcr2   11.014242  1.6632  21.06
## Cd300c   9.039209  1.6684  13.77
## Adgrv1  12.175494  1.6763  35.26
## Gns     10.734762  1.6816 154.77
vln.Anno1_SIu20 <- DotPlot(GEX.seur, features = rev(markers.SItop[1:20,]$gene), group.by = "Anno1")  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6)) + labs(title="Anno1, SI EOS top20 unique")
vln.Anno1_SIu20

dim(markers.SItop)
## [1] 262   9
score.SIu20 <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          markers.SItop[1:20,]$gene)
## Summarizing data
score.SIu40 <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          markers.SItop[1:40,]$gene)
## Summarizing data
score.SIu100 <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          markers.SItop[1:100,]$gene)
## Summarizing data
score.SIu262 <- calculate_signature_score(as.data.frame(GEX.seur@assays[['RNA']]@data),
                                          markers.SItop[,]$gene)
## Summarizing data
GEX.seur <- AddMetaData(GEX.seur,
                        score.SIu20,
                        "score.SI_Uniq20")
GEX.seur <- AddMetaData(GEX.seur,
                        score.SIu20,
                        "score.SI_Uniq40")

GEX.seur <- AddMetaData(GEX.seur,
                        score.SIu100,
                        "score.SI_Uniq100")
GEX.seur <- AddMetaData(GEX.seur,
                        score.SIu262,
                        "score.SI_Uniq262")
vln.score.SIu20 <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Uniq20, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS2","EOS3"),
                                                  c("EOS1","EOS2"),
                                                  c("EOS1","EOS3")),size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
     scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI-EOS unique, top20") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))

vln.score.SIu40 <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Uniq40, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS2","EOS3"),
                                                  c("EOS1","EOS2"),
                                                  c("EOS1","EOS3")),size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
    scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI-EOS unique, top40") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))

vln.score.SIu100 <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Uniq100, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS2","EOS3"),
                                                  c("EOS1","EOS2"),
                                                  c("EOS1","EOS3")),size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
    scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI-EOS unique, top100") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))

vln.score.SIu262 <- GEX.seur@meta.data %>%
    ggplot(aes(Anno1, score.SI_Uniq262, color = Anno1)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("EOS2","EOS3"),
                                                  c("EOS1","EOS2"),
                                                  c("EOS1","EOS3")),
                               size=2
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
    scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI-EOS unique 262") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6),
        plot.title = element_text (size = 7.5))
#vln.score.SI_Pup
#+ NoLegend()
cowplot::plot_grid(
  vln.score.SIu20,
  vln.score.SIu40,
  ncol = 2
)

cowplot::plot_grid(
  vln.score.SIu100,
  vln.score.SIu262,
  ncol = 2
)

check markers

vln.Anno1 <- DotPlot(GEX.seur, features = rev(c("Klra17",
                                   "Nmur1",
                                   "Cdh17","Clec4a4","Dpep2","Ffar1",
                                   "F2rl3","Sirpb1a","Cyp1b1","Cd22",
                                   "Jaml","Dio2","Hcar2","Dpep3",
                                   "Ptger3","Gpr171")), group.by = "Anno1")  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6)) + labs(title="Anno1, SI EOS top16 unique")
vln.Anno1

vln.Anno1_Pup40 <- DotPlot(GEX.seur, features = rev(DEG_0608.SI_Pup[1:40]), group.by = "Anno1")  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6)) + labs(title="Anno1, SI Nmur1 Pup top40")
vln.Anno1_Pup40

vln.Anno1_Nup40 <- DotPlot(GEX.seur, features = rev(DEG_0608.SI_Nup[1:40]), group.by = "Anno1")  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6)) + labs(title="Anno1, SI Nmur1 Nup top40")
vln.Anno1_Nup40

vln.Anno1_Pup40 + vln.Anno1_Nup40

violin

pp.vln1 <- VlnPlot(GEX.seur, features = c("Klra17",
                                   "Nmur1",
                                   "Cdh17","Clec4a4","Dpep2","Ffar1",
                                   "F2rl3","Sirpb1a","Cyp1b1","Cd22",
                                   "Jaml","Dio2","Hcar2","Dpep3",
                                   "Ptger3","Gpr171"), group.by = "Anno1", pt.size = 0.05)  + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

#pp.vln1 <- lapply(pp.vln1,function(vv){
#   vv + geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
#    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
#    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
#    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
#                               method = "wilcox.test",
#                               comparisons = list(c("EOS1","EOS2"),
#                                                  c("EOS2","EOS3"),
#                                                  c("EOS1","EOS3")),
#                               ) +
#     scale_color_manual(values = color.A1) + NoLegend()+ 
#  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))
#  
#})
#cowplot::plot_grid(plotlist = pp.vln1, ncol = 4)

pp.vln1

#pp.vln1

feature

FeaturePlot(GEX.seur, features = c("Klra17",
                                   "Nmur1",
                                   "Cdh17","Clec4a4","Dpep2","Ffar1",
                                   "F2rl3","Sirpb1a","Cyp1b1","Cd22",
                                   "Jaml","Dio2","Hcar2","Dpep3",
                                   "Ptger3","Gpr171"), pt.size = 0.75)

FeaturePlot(GEX.seur, features = DEG_0608.SI_Pup[1:40])

FeaturePlot(GEX.seur, features = DEG_0608.SI_Nup[1:40])

VlnPlot(GEX.seur, features = DEG_0608.SI_Pup[1:40], group.by = "Anno1", cols = color.A1)  + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

VlnPlot(GEX.seur, features = DEG_0608.SI_Nup[1:40], group.by = "Anno1", cols = color.A1)  + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

all markers

# find markers for every cluster compared to all remaining cells, report only the positive ones
Idents(GEX.seur) <- "Anno1"

#GEX.markers.pre <- FindAllMarkers(GEX.seur, only.pos = T, min.pct = 0.05,
#                                  test.use = "MAST",
#                                  logfc.threshold = 0.1)
GEX.markers.pre <- read.table("10x_ZKW_GEX.markers.pure_Anno1.0719new.csv", header = TRUE, sep = ",")
GEX.markers.pre$gene <- gsub(" ","",GEX.markers.pre$gene, fixed = TRUE)
GEX.markers.pre %>% group_by(cluster) %>% top_n(n = 8, wt = avg_log2FC)
## # A tibble: 24 x 7
## # Groups:   cluster [3]
##        p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene   
##        <dbl>      <dbl> <dbl> <dbl>     <dbl> <chr>   <chr>  
##  1 5.31e- 93       1.38 0.721 0.469 7.85e- 89 EOS1    Rgs1   
##  2 1.73e- 69       1.43 0.414 0.171 2.56e- 65 EOS1    Dio2   
##  3 4.46e- 65       1.32 0.45  0.203 6.59e- 61 EOS1    Slc40a1
##  4 1.64e- 60       1.39 0.351 0.142 2.43e- 56 EOS1    Msmo1  
##  5 2.10e- 52       1.34 0.421 0.216 3.10e- 48 EOS1    Idi1   
##  6 6.36e- 50       1.24 0.169 0.026 9.40e- 46 EOS1    Cyp1b1 
##  7 2.99e- 47       1.13 0.502 0.302 4.42e- 43 EOS1    Cyp51  
##  8 4.59e- 43       1.14 0.291 0.116 6.78e- 39 EOS1    Egr2   
##  9 1.62e-242       1.84 0.853 0.364 2.40e-238 EOS2    Nfkbiz 
## 10 2.25e-231       1.58 0.88  0.379 3.32e-227 EOS2    Nfkbia 
## # ... with 14 more rows
GEX.markers.pre$cluster <- factor(as.character(GEX.markers.pre$cluster),
                          levels = levels(GEX.seur$Anno1))

markers.pre_t8 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.25 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.1 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 8, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t16 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.25 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.1 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 16, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t32 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.1 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 32, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t48 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.1 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 48, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t100 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.05 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 100, wt = avg_log2FC/p_val_adj) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC/p_val_adj),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t64 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.1 & !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 64, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t64 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.05 & avg_log2FC >0.1 &   !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 64, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC*pct.1),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene

markers.pre_t300 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                  #filter(pct.1>0.1 & !(gene %in% c(MT_gene,DIG, CC_gene))) %>%
                    filter(pct.1>0.05 & avg_log2FC >0.3 & p_val_adj < 0.01 &  !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   top_n(n = 210, wt = avg_log2FC) %>%
                   ungroup() %>%
  arrange(desc(avg_log2FC),gene) %>%
                             distinct(gene, .keep_all = TRUE) %>%
                             arrange(cluster,p_val_adj))$gene
DotPlot(GEX.seur, features = rev(markers.pre_t64[1:64]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t64[65:128]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t64[129:184]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

#DotPlot(GEX.seur, features = rev(markers.pre_t64[193:256]))  + coord_flip() + 
#  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))
#DotPlot(GEX.seur, features = rev(markers.pre_t64[257:320]))  + coord_flip() + 
#  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))
"Nmur1" %in% markers.pre_t100
## [1] TRUE
DotPlot(GEX.seur, features = rev(markers.pre_t100[1:56]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t100[57:112]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t100[113:168]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t100[169:224]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

DotPlot(GEX.seur, features = rev(markers.pre_t100[225:279]))  + coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

FeaturePlot(GEX.seur,features = markers.pre_t100[1:56])

FeaturePlot(GEX.seur,features = markers.pre_t100[57:112])

FeaturePlot(GEX.seur,features = markers.pre_t100[113:168])

FeaturePlot(GEX.seur,features = markers.pre_t100[169:224])

FeaturePlot(GEX.seur,features = markers.pre_t100[225:279])

GEX.seur@meta.data[,grep("snn|ANN",colnames(GEX.seur@meta.data))] <- NULL
head(GEX.seur@meta.data)
##                    orig.ident nCount_RNA nFeature_RNA percent.mt     S.Score
## AAACCCAGTAACATAG-1   EOS_plus       3031         1249  2.0785219 -0.08458732
## AAACCCATCCTCTAAT-1   EOS_plus       2369          804  0.7598143 -0.05132641
## AAACGAAAGTGCGACA-1   EOS_plus       2395         1135  3.3820459  0.02400687
## AAACGAACATGACAAA-1   EOS_plus       1885          939  0.0530504  0.16220058
## AAACGAACATGGCCCA-1   EOS_plus       2316         1038  0.9930915 -0.06670958
## AAACGAAGTTATTCTC-1   EOS_plus        933          552  0.8574491 -0.03460208
##                      G2M.Score Phase seurat_clusters preAnno1 preAnno2
## AAACCCAGTAACATAG-1 -0.03475332    G1               0     EOS2      EOS
## AAACCCATCCTCTAAT-1 -0.05992585    G1               0     EOS2      EOS
## AAACGAAAGTGCGACA-1 -0.04476544     S               1     EOS2      EOS
## AAACGAACATGACAAA-1  0.14071103     S               1     EOS4      EOS
## AAACGAACATGGCCCA-1 -0.05487850    G1               0     EOS4      EOS
## AAACGAAGTTATTCTC-1 -0.07295720    G1               0     EOS5      EOS
##                    DoubletFinder0.05 DoubletFinder0.1 score.SI_Nup score.SI_Pup
## AAACCCAGTAACATAG-1           Singlet          Singlet    0.4299480  -0.01772511
## AAACCCATCCTCTAAT-1           Singlet          Singlet    0.4119714  -0.05334092
## AAACGAAAGTGCGACA-1           Singlet          Singlet    0.2297882   0.01743818
## AAACGAACATGACAAA-1           Singlet          Singlet    0.2546274   0.02906793
## AAACGAACATGGCCCA-1           Singlet          Singlet    0.4407920  -0.04720992
## AAACGAAGTTATTCTC-1           Singlet          Singlet    0.3183110  -0.08096872
##                    Anno1 score.SI_Uniq20 score.SI_Uniq40 score.SI_Uniq100
## AAACCCAGTAACATAG-1  EOS2      0.26506193      0.26506193      0.179451027
## AAACCCATCCTCTAAT-1  EOS2      0.15362436      0.15362436     -0.056836599
## AAACGAAAGTGCGACA-1  EOS1      0.02957651      0.02957651     -0.034962058
## AAACGAACATGACAAA-1  EOS1      0.19356173      0.19356173      0.158949496
## AAACGAACATGGCCCA-1  EOS2     -0.11463268     -0.11463268      0.004054466
## AAACGAAGTTATTCTC-1  EOS2     -0.02863820     -0.02863820     -0.092051949
##                    score.SI_Uniq262
## AAACCCAGTAACATAG-1       0.13997750
## AAACCCATCCTCTAAT-1      -0.01142811
## AAACGAAAGTGCGACA-1       0.03978634
## AAACGAACATGACAAA-1       0.16699518
## AAACGAACATGGCCCA-1       0.05485604
## AAACGAAGTTATTCTC-1      -0.03272956
0722add

FC > 1.2, padj< 0.01

markers.0722 <- (GEX.markers.pre %>% group_by(cluster) %>% 
                    filter(pct.1>0.05 & 
                             avg_log2FC > log2(1.2) & p_val_adj<0.01 &
                             !(gene %in% grep("^Rps|^Rpl|^Igm|^Igh|^Igk|^Igl|Jchain",rownames(GEX.seur),value = T))) %>%
                   ungroup() %>%
                             arrange(cluster,p_val_adj))
markers.0722
## # A tibble: 594 x 7
##       p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene    
##       <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   
##  1 5.31e-93      1.38  0.721 0.469  7.85e-89 EOS1    Rgs1    
##  2 1.73e-69      1.43  0.414 0.171  2.56e-65 EOS1    Dio2    
##  3 1.01e-68      0.691 0.913 0.763  1.50e-64 EOS1    Adamdec1
##  4 4.46e-65      1.32  0.45  0.203  6.59e-61 EOS1    Slc40a1 
##  5 1.10e-61      0.877 0.885 0.77   1.62e-57 EOS1    Nabp1   
##  6 1.56e-61      0.873 0.724 0.548  2.30e-57 EOS1    Tspan13 
##  7 1.64e-60      1.39  0.351 0.142  2.43e-56 EOS1    Msmo1   
##  8 1.42e-55      0.963 0.534 0.334  2.11e-51 EOS1    Laptm4a 
##  9 2.21e-54      0.749 0.755 0.61   3.26e-50 EOS1    Tiparp  
## 10 2.57e-54      0.706 0.917 0.836  3.80e-50 EOS1    Egr1    
## # ... with 584 more rows
table(markers.0722$cluster)
## 
## EOS1 EOS2 EOS3 
##  261  209  124
table(GEX.seur$Anno1)
## 
## EOS1 EOS2 EOS3 
## 1524 1526  155
DoHeatmap(GEX.seur, features = markers.0722$gene, group.by = "Anno1", group.colors = color.A1) + 
  scale_color_manual(values = color.A1) + 
  scale_fill_gradientn(colors = color.test) 

length(markers.0722$gene)
## [1] 594
length(unique(markers.0722$gene))
## [1] 564
length(markers.0722$gene[1:470])
## [1] 470
length(unique(c(markers.0722$gene[1:470])))
## [1] 470
scales::show_col(colorRampPalette(RColorBrewer::brewer.pal(11,"RdBu"))(256))

scales::show_col(colorRampPalette(RColorBrewer::brewer.pal(11,"RdBu"))(128))

scales::show_col(RColorBrewer::brewer.pal(11,"RdBu"))

sum(rowSums(GEX.seur@assays$RNA@scale.data)==0)
## [1] 12
rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0]
##  [1] "Gm14308"       "Gm35986"       "Gm42531"       "Egr4"         
##  [5] "5430431A17Rik" "1700112J16Rik" "Defa5"         "Gm26714"      
##  [9] "Hepacam"       "Gm48809"       "Gm48940"       "Fitm1"

add some NA markers to build manual row-gaps (unlike pheatmap, Seurat::DoHeatmap no auto-rowgaps parameter setting…)

pp.heat <- DoHeatmap(GEX.seur, features = c(markers.0722$gene[1:261],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][1:2],
                                            markers.0722$gene[262:470],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][5:6],
                                            markers.0722$gene[471:594]), group.by = "Anno1", group.colors = color.A1,draw.lines = T, lines.width = 20, label = T, angle = 30) + 
  scale_color_manual(values = color.A1) + 
  scale_fill_gradientn(colors = rev(colorRampPalette(RColorBrewer::brewer.pal(11,"RdBu"))(256)), na.value = "white") 
pp.heat 

pp.heat1 <- DoHeatmap(GEX.seur, features = c(markers.0722$gene[1:261],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][1:2],
                                            markers.0722$gene[262:470],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][5:6],
                                            markers.0722$gene[471:594]), 
                      group.by = "Anno1", group.colors = color.A1,draw.lines = T, lines.width = 20, label = T, size = 3.5, angle = 25) + 
  scale_color_manual(values = color.A1) + 
  scale_fill_gradientn(colors = rev(colorRampPalette(RColorBrewer::brewer.pal(11,"RdBu"))(256)), na.value = "white") 
pp.heat1 + theme(axis.text.y = element_blank())

GEX.seur@assays$RNA@scale.data[rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][c(1:2,5:6)],] <- NA
PurpleAndYellow <- function(k = 50) {
  return(CustomPalette(low = "magenta", high = "yellow", mid = "black", k = k))
}

pp.heat2 <- DoHeatmap(GEX.seur, features = c(markers.0722$gene[1:261],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][1:2],
                                            markers.0722$gene[262:470],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][5:6],
                                            markers.0722$gene[471:594]), 
                      group.by = "Anno1", group.colors = color.A1,draw.lines = T, lines.width = 10, label = T, size = 3.5, angle = 25) + 
  scale_color_manual(values = color.A1) +
  scale_fill_gradientn(colors = PurpleAndYellow(), na.value = "white") 
pp.heat2 + theme(axis.text.y = element_blank())

RColorBrewer::brewer.pal(11,"RdBu")
##  [1] "#67001F" "#B2182B" "#D6604D" "#F4A582" "#FDDBC7" "#F7F7F7" "#D1E5F0"
##  [8] "#92C5DE" "#4393C3" "#2166AC" "#053061"
BludAndRed <- function(k = 50) {
  return(CustomPalette(low = "#053061", high = "#67001F", mid = "white", k = k))
}

pp.heat3 <- DoHeatmap(GEX.seur, features = c(markers.0722$gene[1:261],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][10:12],
                                            markers.0722$gene[262:470],
                                            rownames(GEX.seur)[rowSums(GEX.seur@assays$RNA@scale.data)==0 & rowSums(GEX.seur@assays$RNA@data)==0][7:9],
                                            markers.0722$gene[471:594]), 
                      group.by = "Anno1", group.colors = color.A1,draw.lines = T, lines.width = 20, label = T, size = 3.5, angle = 25) + 
  scale_color_manual(values = color.A1) +
  scale_fill_gradientn(colors = BludAndRed(), na.value = "white") 
pp.heat3 + theme(axis.text.y = element_blank())

pp.heat$data$gGroup <- pp.heat$data$Feature
#pp.heat$layers
head(markers.0722)
## # A tibble: 6 x 7
##      p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene    
##      <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   
## 1 5.31e-93      1.38  0.721 0.469  7.85e-89 EOS1    Rgs1    
## 2 1.73e-69      1.43  0.414 0.171  2.56e-65 EOS1    Dio2    
## 3 1.01e-68      0.691 0.913 0.763  1.50e-64 EOS1    Adamdec1
## 4 4.46e-65      1.32  0.45  0.203  6.59e-61 EOS1    Slc40a1 
## 5 1.10e-61      0.877 0.885 0.77   1.62e-57 EOS1    Nabp1   
## 6 1.56e-61      0.873 0.724 0.548  2.30e-57 EOS1    Tspan13
1822848/594
## [1] 3068.768
#saveRDS(GEX.seur,"./scEOS.pure_Anno1_0719.rds")
#GEX.seur <- readRDS("./scEOS.pure_Anno1_0719.rds")

public data

the only available public sc-dataset of mouse SI EOS is GSE182001,
(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE182001)
the processed matrices were downloaded to analyze as https://github.com/TheMoorLab/Eosinophils_scRNASeq for local comparison

on the other hand, the raw fastq files were downloaded to run a locally built BD_WTA pipeline with Nmur1-fixed mm10 version index,
unfortunately, though the Nmur1-signal on extending exons were indeed detected,
none of them remained expressed in filtered EOS cells from any tissues

the public data and local data of scEOS are finally not consistently comparable for some reasons,
which might need further confirmation expecially from a third team maybe

BD data

BD.seur <- readRDS("J:/projects_local/projects/202202_BD_WTA/repeat_analysis/public_scEOS.pre0226.rds")
BD.seur
## An object of class Seurat 
## 29311 features across 14409 samples within 1 assay 
## Active assay: RNA (29311 features, 2000 variable features)
##  3 dimensional reductions calculated: pca, tsne, umap
head(BD.seur@meta.data)
##          orig.ident nCount_RNA nFeature_RNA percent.mt     S.Score  G2M.Score
## 870857_1      Colon      22821         3695   2.943238 -0.12697336 -0.1517395
## 47519_1       Colon      17810         3148   2.677369 -0.05345104 -0.1147854
## 143140_1      Colon      23900         4727   4.710312 -0.01138911  0.2843754
## 232339_1      Colon      18547         3398   3.658405 -0.09734297 -0.1533418
## 247978_1      Colon      16156         2643   4.752769 -0.08533308 -0.1033882
## 273412_1      Colon      17165         3343   2.655795  0.03702872 -0.1667155
##          Phase RNA_snn_res.1.2 seurat_clusters RNA_snn_res.0.8 RNA_snn_res.1
## 870857_1    G1              10               9               8             9
## 47519_1     G1              10               9               8             9
## 143140_1   G2M               6               5               4             5
## 232339_1    G1              10               9               8             9
## 247978_1    G1              10               9               8             9
## 273412_1     S              10               9               8             9
##          sort_clusters      Anno
## 870857_1             9 Macro_pro
## 47519_1              9 Macro_pro
## 143140_1             5     Macro
## 232339_1             9 Macro_pro
## 247978_1             9 Macro_pro
## 273412_1             9 Macro_pro
unique(BD.seur$orig.ident)
##  [1] "Colon"        "SI"           "Spleen"       "Stomach"      "blood"       
##  [6] "bonemarrow"   "stomachHP"    "bloodCR"      "bonemarrowCR" "colonCR"
BD.seur_SI <- subset(BD.seur, subset = orig.ident == "SI" & Anno %in% c("EOSprogenitors",
                                                                        "immatureEOS",
                                                                        "circulatingEOS",
                                                                        "basalEOS",
                                                                        "activeEOS"))
BD.seur_SI
## An object of class Seurat 
## 29311 features across 989 samples within 1 assay 
## Active assay: RNA (29311 features, 2000 variable features)
##  3 dimensional reductions calculated: pca, tsne, umap
DimPlot(BD.seur, reduction = "umap", label = T, group.by = "Anno") +
  DimPlot(BD.seur_SI, reduction = "umap", label = T, group.by = "Anno")

VlnPlot(BD.seur, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0, group.by = "Anno")

VlnPlot(BD.seur_SI, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0.05, group.by = "Anno")

VlnPlot(subset(BD.seur_SI, subset= nFeature_RNA < 2000), features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0.05, group.by = "Anno")

table(data.frame(Anno=BD.seur$Anno,Tissue=BD.seur$orig.ident))
##                 Tissue
## Anno             blood bloodCR bonemarrow bonemarrowCR Colon colonCR   SI
##   activeEOS          2       0          1           11  1040     187  421
##   basalEOS         257      82         36           12   222       7  454
##   circulatingEOS   849     234        178           80    41       7   74
##   immatureEOS       30      80        349          604     7       4   28
##   EOSprogenitors     5       9        495          583     1       3   12
##   Epithelium        17       2          8            3   113       9  179
##   Fibro_SMC_Endo     2       1          4            6    77       8  123
##   Macro              9       0         10            6   437       7  141
##   Neutrophil1        0      15        117          500     9      19   10
##   Neutrophil2        2       4        104          499     1       0    0
##   Neutrophil_pro     0       0         65          152     1       1    2
##   Macro_pro          4       1        187          202    35       6   42
##   Bcell1             0       0        167          113     6       0   16
##   Bcell2            16       3          8           11    40       2   47
##                 Tissue
## Anno             Spleen Stomach stomachHP
##   activeEOS           7     108       287
##   basalEOS          658    2084        43
##   circulatingEOS     67     294        18
##   immatureEOS        13      74         3
##   EOSprogenitors      6      30         2
##   Epithelium         45     185        41
##   Fibro_SMC_Endo     13     106         5
##   Macro              23      94        19
##   Neutrophil1         3       7         9
##   Neutrophil2         1       3         0
##   Neutrophil_pro      2       4         0
##   Macro_pro          13      37         3
##   Bcell1              9       8         0
##   Bcell2             35      76         0
pheatmap::pheatmap(t(table(data.frame(Anno=BD.seur$Anno,Tissue=BD.seur$orig.ident)))[,1:5,drop=F],
                   main = "Cell Count, public BD data",show_rownames = T, legend = F,
                   cluster_rows = F, cluster_cols = F, display_numbers = T, number_format = "%.0f")

pheatmap::pheatmap(100*rowRatio(t(table(data.frame(Anno=BD.seur$Anno,Tissue=BD.seur$orig.ident)))[,1:5,drop=F]),
                   main = "Cell Percentage, public BD data",show_rownames = T, legend = F,
                   cluster_rows = F, cluster_cols = F, display_numbers = T, number_format = "%.2f")

table(data.frame(Anno=BD.seur_SI$Anno,drop=F))
##                 drop
## Anno             FALSE
##   activeEOS        421
##   basalEOS         454
##   circulatingEOS    74
##   immatureEOS       28
##   EOSprogenitors    12
##   Epithelium         0
##   Fibro_SMC_Endo     0
##   Macro              0
##   Neutrophil1        0
##   Neutrophil2        0
##   Neutrophil_pro     0
##   Macro_pro          0
##   Bcell1             0
##   Bcell2             0
pheatmap::pheatmap(t(table(data.frame(Anno=BD.seur_SI$Anno,drop=F)))[,1:5,drop=F],
                   main = "Cell Count, SI EOS",show_rownames = F, legend = F,
                   cluster_rows = F, cluster_cols = F, display_numbers = T, number_format = "%.0f")

pheatmap::pheatmap(100*rowRatio(t(table(data.frame(Anno=BD.seur_SI$Anno,drop=F)))[,1:5,drop=F]),
                   main = "Cell Percentage, SI EOS",show_rownames = F, legend = F,
                   cluster_rows = F, cluster_cols = F, display_numbers = T, number_format = "%.2f")

## BD markers 
markers.EOS <- c("Ptprc","Ccr3","Siglecf","Itgax",
                 "Ear2","Ear6","Epx","Il5ra")

markers.c8 <-  c("Epcam","Krt8","Krt18","Cdh1")

markers.c10 <- c("Cygb","Clec3b","Fn1","Col1a1",  # Fibroblast
                 "Acta2","Cnn1","Myl9","Tpm2", # SMC
                  "Pecam1","Icam2","Cdh5","Esam",     # Endotheliocyte
                  "Kdr","Apold1","Flt1","Ptprb")

markers.c5.9 <- c("C1qa","C1qb","C1qc","Mrc1",   # Macrophage
                                   "Cd74","Cd83","H2-Aa","H2-Eb1",
                                   "Mki67","Top2a","Mcm3","Mcm6")

markers.c11.12 <- c("Cd19","Cd79a","Cd79b","Pax5",
                                   "Rag1","Vpreb3","Ms4a1","Cd22")

markers.c6.7.13 <- c("S100a8","S100a9","Clec4d","Clec7a",
                                   "Mmp8","Mmp9","Top2a","Pcna")


final.markers <- c("Mki67", "Tuba1b", "Epx", "Prg3", "Prg2","Ear1","Ear2", "Ear6",  "Cd63", "Cebpe",
                   "Alox15", "Aldh2", "S100a9", "S100a6", "S100a10", "Il5", "Retnla", "Ccl9", "Il1rl1", 
                   "Cd24a", "Mmp9", "Icosl", "Il4", "Tgfb1", "Pirb", "Rara", "Cd80", "Cd274", "Ptgs2", "Il1rn", "Il1b", 
                   "Vegfa", "Ccl3", "Cxcl2", "Il16", "Tnf")
mapal <- colorRampPalette(RColorBrewer::brewer.pal(4,"Spectral"))(120)
DotPlot(BD.seur, features = unique(c(markers.EOS,
                                      markers.c8,
                                      markers.c10,
                                      markers.c5.9,
                                      markers.c6.7.13,
                                      markers.c11.12
                                      )), group.by = "Anno") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

DotPlot(BD.seur, features = final.markers, group.by = "Anno") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

SI EOS

DotPlot(BD.seur_SI, features = unique(c(markers.EOS,
                                      markers.c8,
                                      markers.c10,
                                      markers.c5.9,
                                      markers.c6.7.13,
                                      markers.c11.12
                                      )), group.by = "Anno") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

DotPlot(BD.seur_SI, features = final.markers, group.by = "Anno") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

localdata

DotPlot(GEX.seur, features = unique(c(markers.EOS,
                                      markers.c8,
                                      markers.c10,
                                      markers.c5.9,
                                      markers.c6.7.13,
                                      markers.c11.12
                                      )), group.by = "Anno1") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: Epx, Cnn1, Flt1, Ptprb, Clec7a,
## Rag1
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

DotPlot(GEX.seur, features = final.markers, group.by = "Anno1") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))+ scale_color_gradientn(colors = rev(mapal))
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: Epx, Il5
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

check sig score

score.BDSI_Nup <- calculate_signature_score(as.data.frame(BD.seur_SI@assays[['RNA']]@data),
                                          DEG_0608.SI_Nup)
## Summarizing data
score.BDSI_Pup <- calculate_signature_score(as.data.frame(BD.seur_SI@assays[['RNA']]@data),
                                          DEG_0608.SI_Pup)
## Summarizing data
BD.seur_SI <- AddMetaData(BD.seur_SI,
                        score.BDSI_Nup,
                        "score.SI_Nup")
BD.seur_SI <- AddMetaData(BD.seur_SI,
                        score.BDSI_Pup,
                        "score.SI_Pup")
vln.score.BDSI_Nup <- BD.seur_SI@meta.data %>%
    ggplot(aes(Anno, score.SI_Nup, color = Anno)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("activeEOS","basalEOS"),
                                                  c("basalEOS","circulatingEOS"),
                                                  c("basalEOS","immatureEOS")),
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
     #scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI Nmur1_PvsN, Nup") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))

vln.score.BDSI_Pup <- BD.seur_SI@meta.data %>%
    ggplot(aes(Anno, score.SI_Pup, color = Anno)) +
    geom_violin(draw_quantiles=c(0.25,0.75),trim = FALSE) +
    #ylim(c(-0.25,0.35)) +
    ggbeeswarm::geom_quasirandom(size = 0.01, width = 0.2, alpha = 0.3) +
    stat_summary(fun=mean, geom="point", shape=18, size=3, color="black") +
    ggpubr::stat_compare_means(aes(lable = ..p.signif..), 
                               method = "wilcox.test",
                               comparisons = list(c("activeEOS","basalEOS"),
                                                  c("basalEOS","circulatingEOS"),
                                                  c("basalEOS","immatureEOS")),
                               #label.y = c(0.15,0.2,0.15,0.2,0.17,0.1)
                               ) +
    #scale_color_manual(values = color.A1) +
    labs(x="",title="Signature Score of SI Nmur1_PvsN, Pup") +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.background = element_blank(), 
          panel.border = element_rect(fill = NA)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9.6))
#vln.score.SI_Pup
#+ NoLegend()
cowplot::plot_grid(
  vln.score.BDSI_Nup,
  vln.score.BDSI_Pup,
  ncol = 2
)

through marker comparison and reference mapping,
seems like our EOS1/2/3 all expressed some of their activeEOS markers,
and most cells mapped to activeEOS

using signature score,
their activeEOS is more like our Nmur1+ EOS1

also tried re-clustering of SI-EOS on public data,
but still not totally the same, not sure why,
the issue of profiling different tissue subtypes might be caused by several steps like mice/sorting/library/sequencing